Method and apparatus for multiplying polynomials with a prime number of terms

ABSTRACT

An efficient method and apparatus to compute a product of polynomials of degree n−1, where n is an arbitrary prime, is provided. The total number of multiply operations and Arithmetic Logical Unit (ALU) operations needed to compute the product is reduced by evaluating the product polynomial at a few points, which decreases the number of multiplications while using only simple ALU operations.

FIELD

This disclosure relates to polynomial operations and in particular to polynomial multiplication.

BACKGROUND

A polynomial is a mathematical expression of one or more algebraic terms, for example, “a+bx+cx²”, each of which consists of a constant (a, b or c) multiplied by one or more variables (x) raised to a nonnegative integral power. The schoolbook method to multiply two polynomials is to multiply each term of a first polynomial by each term of a second polynomial. For example, a first polynomial of degree 1 with two terms a₁x+a₀ may be multiplied by a second polynomial of degree 1 with two terms b₁x+b₀ by performing four multiply operations and one addition operation (to combine the two x terms) to produce a polynomial of degree 2 with three terms, as shown below:

$(a_1x + a_0)(b_1x + b_0) = a_1b_1x^2 + (a_0b_1 + a_1b_0)x + a_0b_0$

The number of multiply operations and Arithmetic Logical Unit (ALU) operations increases with the number of terms in the polynomials. For example, using the schoolbook method, multiplying two polynomials each having n terms requires n² multiply operations and (n−1)² additions.
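As an illustration of these counts, the following is a minimal Python sketch of the schoolbook method; it is not part of the described embodiment. It assumes coefficient lists indexed by power of x (a[0] is the constant term), and the function name schoolbook_mul is hypothetical.

```python
def schoolbook_mul(a, b):
    """Schoolbook product of coefficient lists a and b (a[i] is the coefficient of x**i).

    Returns (c, mults, adds), counting the scalar multiplications and the
    additions needed to combine like terms.
    """
    c = [None] * (len(a) + len(b) - 1)   # None marks a coefficient not yet written
    mults = adds = 0
    for i in range(len(a)):
        for j in range(len(b)):
            p = a[i] * b[j]
            mults += 1
            if c[i + j] is None:
                c[i + j] = p             # first term of this coefficient: no addition
            else:
                c[i + j] += p            # combine with an existing like term
                adds += 1
    return c, mults, adds
```

For example, schoolbook_mul([3, 2], [5, 7]) multiplies (2x+3)(7x+5) and returns ([15, 31, 14], 4, 1): four multiplications and one addition. In general the n² products fill 2n−1 coefficients, so n²−(2n−1) = (n−1)² additions are needed.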

BRIEF DESCRIPTION OF THE DRAWINGS

Features of embodiments of the claimed subject matter will become apparent as the following detailed description proceeds, and upon reference to the drawings, in which like numerals depict like parts, and in which:

FIG. 1 is a flowchart illustrating an embodiment of a method to compute the product C(x) of two polynomials A(x) and B(x) each having n terms, where n is a prime number according to the principles of the present invention;

FIG. 2 illustrates the computation of the coefficients of a polynomial C(x) of degree eight using the coefficients of polynomials A(x) and B(x) of degree four; and

FIG. 3 is a block diagram of a system that includes an embodiment of Public Key Encryption (PKE) unit to perform public key encryption using an embodiment of a method to compute the product C(x) of two polynomials A(x) and B(x) each having n terms, where n is a prime number.

Although the following Detailed Description will proceed with reference being made to illustrative embodiments of the claimed subject matter, many alternatives, modifications, and variations thereof will be apparent to those skilled in the art. Accordingly, it is intended that the claimed subject matter be viewed broadly, and be defined only as set forth in the accompanying claims.

DETAILED DESCRIPTION

Polynomial operations such as polynomial multiplication are important in cryptography, for example, in elliptic curve cryptography and in other public key encryption algorithms such as Rivest, Shamir, Adleman (RSA).

The Karatsuba algorithm reduces the number of multiply operations compared to the schoolbook method. It multiplies two two-term polynomials A(x)=a₁x+a₀ and B(x)=b₁x+b₀, with coefficients (a₁, a₀) and (b₁, b₀), using three scalar multiplications instead of four, as shown below:

$C(x) = (a_1x + a_0)(b_1x + b_0) = a_1b_1x^2 + \left((a_0 + a_1)(b_0 + b_1) - a_0b_0 - a_1b_1\right)x + a_0b_0$

Thus, four additions and three multiplications are required to compute the result C(x) of multiplying two two-term polynomials using the Karatsuba algorithm. The Karatsuba algorithm may also be used to multiply two three-term polynomials using six scalar multiplications instead of nine multiplications.
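A minimal Python sketch of the 2-term Karatsuba formula above, with the operation counts noted in comments; karatsuba2 is a hypothetical helper name, and integer coefficients are assumed.

```python
def karatsuba2(a0, a1, b0, b1):
    """Return (c0, c1, c2) for (a1*x + a0) * (b1*x + b0) using three multiplications."""
    d0 = a0 * b0                  # multiplication 1
    d1 = a1 * b1                  # multiplication 2
    d01 = (a0 + a1) * (b0 + b1)   # multiplication 3, after 2 additions
    c1 = d01 - d0 - d1            # 2 subtractions recover a0*b1 + a1*b0
    return d0, c1, d1
```

For example, karatsuba2(3, 2, 5, 7) returns (15, 31, 14), the same coefficients obtained by multiplying out (2x+3)(7x+5) directly.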

Polynomial multiplication may be performed efficiently using the Karatsuba-Ofman algorithm, as discussed in “Five, Six and Seven-term Karatsuba-like formulae”, Peter L. Montgomery, IEEE Transactions on Computers, Vol. 54, No. 3, Mar. 2005. The Karatsuba-Ofman algorithm performs polynomial multiplication using a sub-quadratic number of base multiply operations at the expense of additional, simpler Arithmetic Logical Unit (ALU) operations. Finding efficient formulae for polynomials with an arbitrary number of terms is extremely hard. Montgomery performed a computer search to find formulae for 5-term and 7-term polynomial multiplication; the result is a set of Karatsuba-like formulae that, when executed, act as a Karatsuba algorithm for 5-term and 7-term polynomials. However, the exhaustive search has an exponential run-time and thus cannot be extended beyond seven terms, because it becomes computationally infeasible.

Polynomial multiplication of two polynomials A(x) and B(x) of arbitrary degree d, each with n=d+1 coefficients, may be performed using a one-iteration (non-recursive) Karatsuba algorithm, as discussed in “Generalizations of the Karatsuba algorithm for efficient implementation”, A. Weimerskirch and C. Paar, by:

(1) computing $D_i := a_i b_i$ for each i=0 to n−1, and then computing

$D_{s,t} := (a_s + a_t)(b_s + b_t)$ for each i=1 to 2n−3, and for all s and t with s+t=i and t>s≥0; and

(2) using these computed values to compute each coefficient of C(x).

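However, this technique requires n(n+1)/2 multiplications (the n products $D_i$ plus the n(n−1)/2 products $D_{s,t}$), which is not optimal in terms of the number of scalar multiplications.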
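The one-iteration procedure outlined above can be sketched in Python as follows. This is an illustrative reading of the description, not code from Weimerskirch and Paar, and the function name one_iteration_ka is hypothetical. Each coefficient $c_i$ is assembled as the sum of $D_{s,t} - D_s - D_t$ over the pairs with s+t=i, plus $D_{i/2}$ when i is even.

```python
def one_iteration_ka(a, b):
    """One-iteration (non-recursive) Karatsuba product of two n-term polynomials.

    a[i], b[i] are the coefficients of x**i. Returns (c, mults).
    """
    n = len(a)
    assert len(b) == n, "both polynomials must have n terms"
    d = [a[i] * b[i] for i in range(n)]       # D_i = a_i * b_i: n multiplications
    mults = n
    c = [0] * (2 * n - 1)
    for i in range(2 * n - 1):
        if i % 2 == 0:
            c[i] += d[i // 2]                  # equal-index term a_{i/2} * b_{i/2}
        # all pairs s < t with s + t = i and t <= n - 1
        for s in range(max(0, i - n + 1), (i + 1) // 2):
            t = i - s
            d_st = (a[s] + a[t]) * (b[s] + b[t])   # D_{s,t}: one multiplication
            mults += 1
            c[i] += d_st - d[s] - d[t]             # equals a_s*b_t + a_t*b_s
    return c, mults
```

For n=5 the sketch performs 5 + 10 = 15 multiplications, matching n(n+1)/2.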

A prime number is any integer other than 0 or ±1 that is not divisible without remainder by any other integers except ±1 and ± the integer itself. For example, 2, 3, 5, 7, 11, and 13 are prime numbers. An embodiment of the present invention provides a method and apparatus that uses a non-recursive Karatsuba (KA) algorithm to multiply polynomials having an arbitrary prime number of terms with fewer multiplications than prior art methods and that has a better performance than the Montgomery exhaustive search for polynomials having five and seven terms.

In contrast to the Montgomery exhaustive search method, which is limited to seven terms due to computational infeasibility, an embodiment of the present invention may be applied to any arbitrary prime number of terms. An embodiment uses Arithmetic Logical Unit (ALU) operations such as addition/subtraction and a single-bit shift in addition to multiplication, and uses fewer ALU operations than the Montgomery exhaustive search method for five-term and seven-term polynomials.

FIG. 1 is a flowchart illustrating an embodiment of a method to compute the product C(x) of two polynomials A(x) and B(x) each having n terms, where n is a prime number according to the principles of the present invention.

The product C(x) may be computed as follows:

C(x) = (a_((n − 1)) ⋅ X^(n − 1) + a_((n − 2)) ⋅ X^(n − 2) + …  a₁ ⋅ X + a₀) * (b_((n − 1)) ⋅ X^(n − 1) + b_((n − 2)) ⋅ X^(n − 2) + …  b₁ ⋅ X + b₀)

-   -   C(x) may be represented as follows:

C(x)=c _((2n−1)) .X ^(2(n−1)) + . . . c ₁ .X+c ₀.
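Written coefficient by coefficient, each coefficient of C(x) collects every product whose indices sum to its own index; this is the standard convolution identity, stated here for reference:

```latex
c_k = \sum_{\substack{i + j = k \\ 0 \le i,\, j \le n-1}} a_i\, b_j, \qquad k = 0, 1, \ldots, 2n-2
```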

An embodiment of the invention will now be described in which the polynomials A(x) and B(x) each have five terms, that is, n is equal to 5 (a prime number), and are of degree 4. In this embodiment, C(x) is computed as follows:

$\begin{matrix} {{{C(x)} = {\left( {{a_{4} \cdot X^{4}} + {a_{3} \cdot X^{3}} + {\ldots \mspace{14mu} {a_{1} \cdot X}} + a_{0}} \right)*}}\mspace{14mu}} \\ {\left( {{b_{4} \cdot X^{4}} + {b_{3} \cdot X^{3}} + {\ldots \mspace{14mu} {b_{1} \cdot X}} + b_{0}} \right)} \\ {= {{c_{8} \cdot X^{8}} + {\ldots \mspace{14mu} {c_{1} \cdot X}} + c_{0}}} \end{matrix}$

FIG. 2 illustrates the computation of the individual coefficients of a polynomial C(x) of degree 8 using the individual coefficients of polynomials A(x) and B(x) of degree 4. As shown in FIG. 2, the computation of the coefficients c0-c8 of C(x) 200 requires the computation of five products using the five coefficients of A(x), that is, (a0-a4) and the five coefficients of B(x), that is, (b0-b4) for which the indices (0-4) are equal, that is, (a0.b0) 202, (a1.b1) 206, (a2.b2) 212, (a3.b3) 214 and (a4.b4) 220.
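Expanding the product term by term for n = 5 gives the following coefficients, reconstructed here from the definition of C(x) rather than from the figure itself:

```latex
\begin{aligned}
c_0 &= a_0b_0\\
c_1 &= a_0b_1 + a_1b_0\\
c_2 &= a_0b_2 + a_1b_1 + a_2b_0\\
c_3 &= a_0b_3 + a_1b_2 + a_2b_1 + a_3b_0\\
c_4 &= a_0b_4 + a_1b_3 + a_2b_2 + a_3b_1 + a_4b_0\\
c_5 &= a_1b_4 + a_2b_3 + a_3b_2 + a_4b_1\\
c_6 &= a_2b_4 + a_3b_3 + a_4b_2\\
c_7 &= a_3b_4 + a_4b_3\\
c_8 &= a_4b_4
\end{aligned}
```

The equal-index products a0.b0 through a4.b4 appear in c0, c2, c4, c6 and c8; every other term belongs to a pair of the form a_i.b_j + a_j.b_i.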

Returning to FIG. 1, at block 100, the products of coefficients for which the indices are equal ($a_i b_i$), as shown in FIG. 2, are computed first. Some of these products are used later to compute coefficients of C(x) using the 2-term Karatsuba algorithm. Processing continues with block 102.

At block 102, the products of the coefficients of A and B that can be computed as a series of 2-term Karatsuba multiplications are computed. Computing the coefficients c0-c8 of C(x) requires computing sums of the form $s = a_i b_j + a_j b_i$, where $i \neq j$, as shown in FIG. 2. The number of multiply operations for computing such a sum may be reduced from two to one through the use of the 2-term Karatsuba algorithm, by computing:

$s = (a_i + a_j)(b_i + b_j) - a_i b_i - a_j b_j$

As the products $a_i b_i$ and $a_j b_j$ have already been computed at block 100, only one product, $(a_i + a_j)(b_i + b_j)$, needs to be computed. For example, coefficient c7 of C(x) is $a_3b_4 + a_4b_3$; computed directly, this requires two multiply operations ($a_3b_4$ and $a_4b_3$) and an addition, whereas the 2-term Karatsuba algorithm needs only the single product $(a_3 + a_4)(b_3 + b_4)$. Likewise, coefficient c3 of C(x) may be computed with the 2-term Karatsuba algorithm by computing $(a_0b_3 + a_3b_0)$ with a single multiply operation and $(a_1b_2 + a_2b_1)$ with a second multiply operation. A plurality of 2-term Karatsuba multiplications are performed using coefficients of A and B for which the indices satisfy i≠j and i+j is equal to neither n nor n−1. That is, the following computation is performed:

$D_{s,t} := (a_s + a_t)(b_s + b_t)$

for each i=1 to 2n−3, and for all s and t with s+t=i, s+t≠n, s+t≠n−1 and t>s≥0.

The coefficients c0-c3 and c6-c8 of C(x) shown in FIG. 2 are computed using the computed 2-term Karatsuba multiplications 204, 208, 215, 210, 218 and the products of same indices 202, 206, 212, 214, 220. Processing continues with block 104.
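Blocks 100 and 102 can be sketched in Python as below. This is a minimal illustration under stated assumptions: integer coefficients in lists indexed by power of x, and a hypothetical function name blocks_100_102. The coefficients $c_{n-1}$ and $c_n$ (c4 and c5 when n = 5) are deliberately left as None, since they are recovered later from C(1) and C(−1).

```python
def blocks_100_102(a, b):
    """Blocks 100 and 102: every coefficient of C except c[n-1] and c[n].

    a[i], b[i] are the coefficients of x**i; n = len(a) is assumed to be an odd prime.
    Returns (c, mults), with c[n-1] and c[n] left as None.
    """
    n = len(a)
    assert len(b) == n, "both polynomials must have n terms"
    d = [a[i] * b[i] for i in range(n)]        # block 100: n equal-index products
    mults = n
    c = [None] * (2 * n - 1)
    for k in range(2 * n - 1):
        if k in (n - 1, n):
            continue                           # recovered later from C(1) and C(-1)
        total = d[k // 2] if k % 2 == 0 else 0
        # block 102: one 2-term Karatsuba product per pair s < t with s + t = k
        for s in range(max(0, k - n + 1), (k + 1) // 2):
            t = k - s
            total += (a[s] + a[t]) * (b[s] + b[t]) - d[s] - d[t]
            mults += 1
        c[k] = total
    return c, mults
```

For n = 5 this performs the five equal-index products and the six pair products for (0,1), (0,2), (0,3), (1,2), (2,4) and (3,4), that is, 11 multiplications.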

At block 104, coefficient c4 could be computed using two 2-term Karatsuba multiplications, one for (a0.b4 + a4.b0) and one for (a1.b3 + a3.b1), together with the pre-computed product (a2.b2). However, this would require two multiply operations to compute coefficient c4 and another two multiply operations to compute coefficient c5, from (a1.b4 + a4.b1) and (a2.b3 + a3.b2). The total number of multiply operations to compute C(x) may instead be reduced by evaluating the polynomial C(x) at the two points {−1, 1}, as shown below in Table 1:

TABLE 1

With x = 1:
$C(1) = A(1) \cdot B(1) = (a_4 + a_3 + a_2 + a_1 + a_0)(b_4 + b_3 + b_2 + b_1 + b_0) = c_0 + c_1 + c_2 + c_3 + c_4 + c_5 + c_6 + c_7 + c_8$

With x = −1:
$C(-1) = A(-1) \cdot B(-1) = (a_4 - a_3 + a_2 - a_1 + a_0)(b_4 - b_3 + b_2 - b_1 + b_0) = c_0 - c_1 + c_2 - c_3 + c_4 - c_5 + c_6 - c_7 + c_8$

Each of the evaluations of C(x) shown in Table 1 requires one multiply operation (product) for a total of 2 multiply operations.

The result of adding C(1) and C(−1) is as shown below:

$\begin{matrix} {{{C(1)} + {C\left( {- 1} \right)}} = {\left( {c_{0} + c_{1} + c_{2} + c_{3} + c_{4} + c_{5} + c_{6} + c_{7} + c_{8}} \right) +}} \\ {{c_{0} - c_{1} + c_{2} - c_{3} + c_{4} - c_{5} + c_{6} - c_{7} + c_{8}}} \\ {= {2\left\lbrack {{c\; 0} + {c\; 2} + {c\; 4} + {c\; 6} + {c\; 8}} \right\rbrack}} \end{matrix}$

As the coefficients c0, c2, c6 and c8 have already been computed as discussed in conjunction with block 100 and block 102, coefficient c4 of C(x) may be computed using these coefficients (c0, c2, c6 and c8) as shown below:

c4={[C(x=1)+C(x=−1)]>>1}−[c0+c2+c6+c8]

Processing continues with block 106.

At block 106, coefficient c5 of C(x) may be computed in a similar manner by subtracting C(−1) from C(1), as shown below:

$\begin{matrix} {{{C(1)} - {C\left( {- 1} \right)}} = {\left( {c_{0} + c_{1} + c_{2} + c_{3} + c_{4} + c_{5} + c_{6} + c_{7} + c_{8}} \right) -}} \\ {{c_{0} - c_{1} + c_{2} - c_{3} + c_{4} - c_{5} + c_{6} - c_{7} + c_{8}}} \\ {= {2\left\lbrack {{c\; 1} + {c\; 3} + {c\; 5} + {c\; 7}} \right\rbrack}} \end{matrix}$

Thus, c5 may be computed using the previously computed coefficients c1, c3 and c7, as shown below:

c5={[C(1)−C(−1)]>>1}−[c1+c3+c7]

Thus, only two multiply operations are used to compute coefficients c4 and c5, one to compute C(−1) and the other to compute C(1). The further calculations to compute coefficients c4 and c5 using C(−1) and C(1) only require simple addition/subtraction or right-shift-logical-by-one operations.
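Blocks 104 and 106 can be sketched as follows, as an illustration rather than a reference implementation. The hypothetical helper recover_middle assumes that c already holds every coefficient except c[n-1] and c[n] (marked None), as produced by blocks 100 and 102, and that n is odd, so that $c_{n-1}$ has an even index and $c_n$ an odd index.

```python
def alternating_sum(p):
    """Evaluate a polynomial with coefficient list p at x = -1 using only additions and subtractions."""
    return sum(v if i % 2 == 0 else -v for i, v in enumerate(p))


def recover_middle(c, a, b):
    """Fill in c[n-1] and c[n] (None on entry) from C(1) and C(-1) with two multiplications."""
    n = len(a)
    c_at_1 = sum(a) * sum(b)                                 # C(1) = A(1) * B(1): multiplication 1
    c_at_minus_1 = alternating_sum(a) * alternating_sum(b)   # C(-1) = A(-1) * B(-1): multiplication 2
    known_even = sum(v for k, v in enumerate(c) if k % 2 == 0 and v is not None)
    known_odd = sum(v for k, v in enumerate(c) if k % 2 == 1 and v is not None)
    out = list(c)
    # C(1) + C(-1) is twice the sum of the even-index coefficients, so >> 1 is exact;
    # Python's >> is an arithmetic shift, so this also holds for negative sums.
    out[n - 1] = ((c_at_1 + c_at_minus_1) >> 1) - known_even   # block 104
    # C(1) - C(-1) is twice the sum of the odd-index coefficients.
    out[n] = ((c_at_1 - c_at_minus_1) >> 1) - known_odd        # block 106
    return out
```

Only the two products forming C(1) and C(−1) are multiplications; everything else is addition, subtraction or a one-bit shift.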

The total number of multiply operations to compute all coefficients c0-c8 of C(x) is thus 13, in contrast to the 15 required by the one-iteration (non-recursive) Karatsuba algorithm. Referring to FIG. 2, five multiply operations compute (a0.b0) 202, (a1.b1) 206, (a2.b2) 212, (a3.b3) 214 and (a4.b4) 220; six multiply operations compute (a0b1, a1b0) 204, (a0b2, a2b0) 208, (a1b2, a2b1) 215, (a0b3, a3b0) 210, (a2b4, a4b2) 216 and (a3b4, a4b3) 248; and two multiply operations compute C(1) and C(−1). The number of multiply operations for multiplying two 5-term polynomials of degree 4 is thus the same as that of the Montgomery exhaustive search, while the total number of ALU operations is lower.

An embodiment has been described for multiplying two 5-term polynomials (A(x) with coefficients a₄-a₀ and B(x) with coefficients b₄-b₀) to produce a nine-term polynomial result (c₈-c₀) of degree 8. However, the invention is not limited to multiplication of 5-term polynomials of degree 4; an embodiment may use polynomials having any prime number of terms n.

All coefficients of the result C(x) of multiplying two n-term polynomials, where n is prime, except for the coefficients $c_{n-1}$ and $c_n$, may be computed as discussed in conjunction with blocks 100 and 102 in FIG. 1. Then, C(x) is evaluated at the points {−1, 1}, and the resulting linear system of equations is solved to derive $c_{n-1}$ and $c_n$, which requires only two multiply operations. Because n is an odd prime, $c_{n-1}$ has an even index and $c_n$ has an odd index, so $C(1) + C(-1)$ determines $c_{n-1}$ and $C(1) - C(-1)$ determines $c_n$. Thus, the number of multiply operations is n(n+1)/2−(n−3) for multiplying two n-term polynomials of degree n−1.
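The multiplication count can be checked by tallying the groups of products; the derivation below is worked out from the description above and assumes n is an odd prime:

```latex
\begin{aligned}
\text{block 100 (equal indices)} &:\; n\\
\text{pairs } i<j \text{ with } i+j=n-1 &:\; \tfrac{n-1}{2}\\
\text{pairs } i<j \text{ with } i+j=n &:\; \tfrac{n-1}{2}\\
\text{block 102 (remaining pairs)} &:\; \tfrac{n(n-1)}{2}-(n-1)\\
\text{blocks 104 and 106 } (C(1),\ C(-1)) &:\; 2\\
\text{total} &= n + \tfrac{n(n-1)}{2} - (n-1) + 2 = \tfrac{n(n+1)}{2} - (n-3)
\end{aligned}
```

For n = 5 this gives 5 + 6 + 2 = 13. The saving over the one-iteration Karatsuba algorithm is n−3 multiplications, which is positive only when n > 3, consistent with claims 2, 6, 10 and 14.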

Thus, the number of multiply operations is 13 for polynomials of degree 4, that is, with five terms (n=5), which is the same number of multiply operations as required by the exhaustive search technique of Montgomery and two fewer than the number used by the one-iteration (non-recursive) Karatsuba algorithm. As there is no limit to the number of terms in the polynomials to be multiplied, an embodiment of the invention may be used for a prime number of terms of 11 and greater, that is, for cases in which the exhaustive search technique of Montgomery cannot be used. For seven terms (n=7), the number of multiply operations is 24, in contrast to the 28 multiply operations required by the one-iteration (non-recursive) Karatsuba algorithm (1-KA); for eleven terms (n=11), it is 58 in contrast to 66.
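To make the comparison concrete for several prime sizes, the following hypothetical helper evaluates the closed-form counts given in this description, n(n+1)/2 for the one-iteration Karatsuba algorithm and n(n+1)/2−(n−3) for the described method, alongside the n² of the schoolbook method. It tabulates formulas only; it does not run either algorithm.

```python
def multiply_counts(n):
    """Return (schoolbook, one-iteration KA, described method) multiplication counts for n terms."""
    one_iteration = n * (n + 1) // 2
    return n * n, one_iteration, one_iteration - (n - 3)


for n in (5, 7, 11, 13):
    schoolbook, one_ka, method = multiply_counts(n)
    print(f"n={n:2d}  schoolbook={schoolbook:3d}  1-KA={one_ka:2d}  method={method:2d}")
```

For n = 5, 7, 11 and 13 the described method needs 13, 24, 58 and 81 multiplications, compared with 15, 28, 66 and 91 for the one-iteration Karatsuba algorithm.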

An embodiment of the invention pertains to an efficient method and apparatus to compute the product of two polynomials having an arbitrary prime number of terms. The total number of multiply operations is less than that used by the one-iteration (non-recursive) Karatsuba algorithm and is similar to the number of multiply operations used by the exhaustive search technique discussed by Montgomery for some prime numbers of terms.

In contrast to the exhaustive search technique of Montgomery, which is limited to 7 terms due to computational infeasibility, an embodiment of the invention may be applied to polynomials having any arbitrary prime number of terms. In addition to multiply operations, an embodiment of the invention uses only simple Arithmetic Logical Unit (ALU) operations, such as addition/subtraction and single-bit shift operations, and uses very few ALU operations in total compared to the exhaustive search technique of Montgomery. This is achieved by evaluating the product polynomial at a few points to decrease the number of multiplications.

The performance of an embodiment of the invention is better than that of the exhaustive Montgomery search for multiplication of 5-term and 7-term polynomials, even though the number of multiply operations is the same, because fewer ALU operations (addition/subtraction and shift) are used.

An embodiment has been described for polynomials with integer coefficients. However, the invention is not limited to integer coefficients; for example, an embodiment may use coefficients in a Galois field GF(2^n) instead of integers.

FIG. 3 is a block diagram of a system 100 that includes an embodiment of Public Key Encryption (PKE) unit 108 to perform public key encryption using an embodiment of a method to compute the product C(x) of two polynomials A(x) and B(x) each having n terms, where n is a prime number.

The system 100 includes a processor 301, a Memory Controller Hub (MCH) 302 and an Input/Output (I/O) Controller Hub (ICH) 304. The MCH 302 includes a memory controller 306 that controls communication between the processor 301 and memory 310. The processor 301 and MCH 302 communicate over a system bus 316.

The processor 301 may be any one of a plurality of processors such as a single core Intel® Pentium IV® processor, a single core Intel Celeron processor, an Intel® XScale processor or a multi-core processor such as Intel® Pentium D, Intel® Xeon® processor, or Intel® Core® Duo processor or any other type of processor.

The memory 310 may be Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), Synchronized Dynamic Random Access Memory (SDRAM), Double Data Rate 2 (DDR2) RAM or Rambus Dynamic Random Access Memory (RDRAM) or any other type of memory.

The ICH 304 may be coupled to the MCH 302 using a high speed chip-to-chip interconnect 314 such as Direct Media Interface (DMI). DMI supports 2 Gigabit/second concurrent transfer rates via two unidirectional lanes.

The ICH 304 may include a storage I/O controller 320 for controlling communication with at least one storage device 312 coupled to the ICH 304. The storage device 312 may be, for example, a disk drive, Digital Video Disk (DVD) drive, Compact Disk (CD) drive, Redundant Array of Independent Disks (RAID), tape drive or other storage device. The ICH 304 may communicate with the storage device 312 over a storage protocol interconnect 318 using a serial storage protocol such as Serial Attached Small Computer System Interface (SAS) or Serial Advanced Technology Attachment (SATA).

In an embodiment, the Public Key Encryption (PKE) unit 108 includes a state machine 356, an Arithmetic Logical Unit (ALU) 352 and a multiplier 354 to perform multiplication of polynomials as discussed in conjunction with FIGS. 1 and 2. The ALU 352 performs integer operations such as addition and subtraction and also performs bit-shifting operations, for example, shifting or rotating by a specified number of bits to the left or right. The state machine 356 controls the sequence of operations performed by the ALU 352 and the multiplier 354 to perform the polynomial multiplication.

In another embodiment, the polynomial multiplication as discussed in conjunction with FIGS. 1 and 2 may be performed by the processor 301 executing a polynomial multiplication library function 350 that may be stored in memory 310.

It will be apparent to those of ordinary skill in the art that methods involved in embodiments of the present invention may be embodied in a computer program product that includes a computer usable medium. For example, such a computer usable medium may consist of a read only memory device, such as a Compact Disk Read Only Memory (CD ROM) disk or conventional ROM devices, or a computer diskette, having a computer readable program code stored thereon.

While embodiments of the invention have been particularly shown and described with references to embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of embodiments of the invention encompassed by the appended claims. 

1. A method comprising: computing a product C of a polynomial A having n coefficients and a polynomial B having n coefficients, C having (2n−1) coefficients, where n is prime by: computing products of coefficients of A and coefficients of B for which indices (i, j) are equal; computing products of coefficients of A and B for which indices (i, j) are not equal and the sum of the indices is not equal to n or n−1; using the computed products to compute coefficients of C; and computing the nth and (n−1)th coefficients of C using a result of evaluating C at {1, −1} and the computed coefficients of C.
 2. The method of claim 1, wherein n is greater than three.
 3. The method of claim 1, wherein the nth coefficient of C is computed using the result of subtracting the evaluation of C at 1 from the evaluation of C at −1.
 4. The method of claim 1, wherein the (n−1)th coefficient of C is computed using the result of adding the evaluation of C at 1 and the evaluation of C at −1.
 5. An apparatus comprising: a polynomial multiplier to compute a product C of a polynomial A having n coefficients and a polynomial B having n coefficients, C having (2n−1) coefficients, where n is prime, the polynomial multiplier to compute products of coefficients of A and coefficients of B for which indices (i, j) are equal, compute products of coefficients of A and B for which indices (i, j) are not equal and the sum of the indices is not equal to n or n−1, to use the computed products to compute coefficients of C and to compute the nth and (n−1)th coefficients of C using a result of evaluating C at {1, −1} and the computed coefficients of C.
 6. The apparatus of claim 5, wherein n is greater than three.
 7. The apparatus of claim 5, wherein the nth coefficient of C is computed using the result of subtracting the evaluation of C at 1 from the evaluation of C at −1.
 8. The apparatus of claim 5, wherein the (n−1)th coefficient of C is computed using the result of adding the evaluation of C at 1 and the evaluation of C at −1.
 9. An article including a machine-accessible medium having associated information, wherein the information, when accessed, results in a machine performing: computing a product C of a polynomial A having n coefficients and a polynomial B having n coefficients, C having (2n−1) coefficients, where n is prime by: computing products of coefficients of A and coefficients of B for which indices (i, j) are equal; computing products of coefficients of A and B for which indices (i, j) are not equal and the sum of the indices is not equal to n or n−1; using the computed products to compute coefficients of C; and computing the nth and (n−1)th coefficients of C using a result of evaluating C at {1, −1} and the computed coefficients of C.
 10. The article of claim 9, wherein n is greater than three.
 11. The article of claim 9, wherein the nth coefficient of C is computed using the result of subtracting the evaluation of C at 1 from the evaluation of C at −1.
 12. The article of claim 9, wherein the (n−1)th coefficient of C is computed using the result of adding the evaluation of C at 1 and the evaluation of C at −1.
 13. A system comprising: a dynamic random access memory to store a polynomial A having n coefficients and a polynomial B having n coefficients; and a polynomial multiplier to compute a product C of A and B, C having (2n−1) coefficients, where n is prime, the polynomial multiplier to compute products of coefficients of A and coefficients of B for which indices (i, j) are equal, compute products of coefficients of A and B for which indices (i, j) are not equal and the sum of the indices is not equal to n or n−1, to use the computed products to compute coefficients of C and to compute the nth and (n−1)th coefficients of C using a result of evaluating C at {1, −1} and the computed coefficients of C.
 14. The system of claim 13, wherein n is greater than three.
 15. The system of claim 13, wherein the nth coefficient of C is computed using the result of subtracting the evaluation of C at 1 from the evaluation of C at −1.
 16. The system of claim 13, wherein the (n−1)th coefficient of C is computed using the result of adding the evaluation of C at 1 and the evaluation of C at −1. 